
    A new penalty term for the BIC with respect to speaker diarization

    In this paper we examine a new penalty term for the Bayesian Information Criterion (BIC) that is suited to the problem of speaker diarization. Based on our previous approach of penalizing each cluster only with its effective sample size - an approach we called segmental - we propose a stricter penalty term. The criterion we derive retains the main property of the Segmental-BIC, i.e. it approximates the evidence of overall partitions of the data and simultaneously leads to a pairwise dissimilarity measure that is completely defined by the pair of clusters in question. Experimental results show a significant improvement in diarization accuracy on the ESTER benchmark.
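    The contrast described in the abstract above can be sketched as follows. This is a minimal illustration assuming a global BIC with d free parameters and N total samples versus a per-cluster penalty; the symbols d_c, n_c, and the tuning factor lambda are assumptions for exposition, not the paper's exact notation.

    ```latex
    % Standard BIC: a single global penalty using the total sample size N
    \mathrm{BIC} = \log p(X \mid \hat\theta) \;-\; \frac{\lambda}{2}\, d \log N

    % Segmental-style variant: each cluster c is penalized only with
    % its own effective sample size n_c
    \mathrm{BIC}_{\mathrm{seg}} = \sum_{c=1}^{C} \Big[ \log p(X_c \mid \hat\theta_c) \;-\; \frac{\lambda}{2}\, d_c \log n_c \Big]
    ```

    Because each term of the segmental sum depends only on one cluster, comparing two candidate merges reduces to a pairwise dissimilarity fully determined by the two clusters involved, which is the property the abstract highlights.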

    Weakly-supervised forced alignment of disfluent speech using phoneme-level modeling

    The study of speech disorders can benefit greatly from time-aligned data. However, audio-text mismatches in disfluent speech cause rapid performance degradation for modern speech aligners, hindering the use of automatic approaches. In this work, we propose a simple and effective modification of the alignment graph construction of CTC-based models using Weighted Finite State Transducers. The proposed weakly-supervised approach alleviates the need for verbatim transcription of speech disfluencies for forced alignment. During graph construction, we allow the modeling of common speech disfluencies, i.e. repetitions and omissions. Further, we show that by assessing the degree of audio-text mismatch through the use of Oracle Error Rate, our method can be effectively used in the wild. Our evaluation on a corrupted version of the TIMIT test set and the UCLASS dataset shows significant improvements, particularly for recall, achieving a 23-25% relative improvement over our baselines. Comment: Interspeech 202
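    A toy illustration of the graph-construction idea in the abstract above: a linear alignment graph over a phoneme sequence in which each phoneme receives, besides its normal emission arc, an epsilon "skip" arc (modeling an omission) and a backward arc (modeling a repetition). The arc representation, penalty weights, and function name are illustrative assumptions, not the authors' WFST implementation.

    ```python
    def build_alignment_graph(phonemes, skip_penalty=2.0, repeat_penalty=1.0):
        """Return arcs as (src_state, dst_state, label, weight) tuples.

        State i sits before phoneme i; state i+1 sits after it.
        """
        arcs = []
        for i, ph in enumerate(phonemes):
            # Normal path: emit phoneme i and advance.
            arcs.append((i, i + 1, ph, 0.0))
            # Omission: jump over phoneme i without emitting it.
            arcs.append((i, i + 1, "<eps>", skip_penalty))
            # Repetition: go back and emit phoneme i again.
            arcs.append((i + 1, i, ph, repeat_penalty))
        return arcs

    graph = build_alignment_graph(["hh", "ax", "l", "ow"])
    ```

    In a real WFST toolkit these arcs would be composed with the CTC emission lattice; the penalties bias the search toward the verbatim path unless the audio genuinely contains a disfluency.
    
    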

    Investigating Personalization Methods in Text to Music Generation

    In this work, we investigate the personalization of text-to-music diffusion models in a few-shot setting. Motivated by recent advances in the computer vision domain, we are the first to explore the combination of pre-trained text-to-audio diffusers with two established personalization methods. We experiment with the effect of audio-specific data augmentation on overall system performance and assess different training strategies. For evaluation, we construct a novel dataset with prompts and music clips. We consider both embedding-based and music-specific metrics for quantitative evaluation, as well as a user study for qualitative evaluation. Our analysis shows that similarity metrics are in accordance with user preferences and that current personalization approaches tend to learn rhythmic music constructs more easily than melody. The code, dataset, and example material of this study are open to the research community. Comment: Submitted to ICASSP 2024, Examples at https://zelaki.github.io
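    The "embedding-based metrics" mentioned above typically reduce to comparing fixed-size embeddings of generated and reference clips. A minimal sketch of such a comparison, assuming embeddings have already been extracted by some audio encoder; the function name and the pure-Python implementation are illustrative, not the paper's evaluation code.

    ```python
    import math

    def cosine_similarity(a, b):
        """Cosine similarity between two equal-length embedding vectors."""
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        return dot / (norm_a * norm_b)
    ```

    In practice the embeddings would come from a pretrained audio encoder, and the score would be averaged over many generated/reference pairs before being compared against user-study ratings.
    
    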

    Enabling the human in the loop: Linked data and knowledge in industrial cyber-physical systems

    Industrial Cyber-Physical Systems have benefited substantially from the introduction of a range of technology enablers. These include web-based and semantic computing, ubiquitous sensing, the internet of things (IoT) with multi-connectivity, advanced computing architectures and digital platforms, coupled with edge- or cloud-side data management and analytics, and have contributed to shaping enhanced or new data value chains in manufacturing. While parts of such data flows are increasingly automated, there is now a greater demand for more effectively integrating, rather than eliminating, human cognitive capabilities in the loop of production-related processes. Human integration in Cyber-Physical environments can already be digitally supported in various ways. However, incorporating human skills and tangible knowledge requires approaches and technological solutions that facilitate the engagement of personnel within technical systems in ways that take advantage of or amplify their cognitive capabilities, to achieve more effective sociotechnical systems. After analysing related research, this paper introduces a novel viewpoint for enabling human-in-the-loop engagement linked to cognitive capabilities, highlighting the role of context information management in industrial systems. Furthermore, it presents examples of technology enablers for placing the human in the loop in selected application cases relevant to production environments. Such placement benefits from the joint management of linked maintenance data and knowledge, expands the power of machine learning for asset awareness with embedded event detection, and facilitates IoT-driven analytics for product lifecycle management.

    Innovative Applications of Natural Language Processing and Digital Media in Theatre and Performing Arts

    The objective of our research is to investigate new digital techniques and tools, offering the audience innovative, attractive, enhanced and accessible experiences. The project focuses on performing arts, particularly theatre, aiming at designing, implementing, experimenting with and evaluating technologies and tools that expand the semiotic code of a performance by offering new opportunities and aesthetic means in stage art and by introducing parallel accessible narrative flows. In our novel paradigm, modern technologies emphasize the stage elements, providing a multilevel, intense and immersive theatrical experience. Moreover, lighting, video projections, audio clips and digital characters are incorporated, bringing unique aesthetic features. We also attempt to remove sensory and language barriers faced by some audiences. Accessibility features consist of subtitles, sign language and audio description. The project emphasises natural language processing technologies, embedded communication and multimodal interaction to automatically monitor the time flow of a performance. Based on this, pre-designed and directed stage elements are mapped to appropriate parts of the script and activated automatically by using the virtual "world" and appropriate sensors, while accessibility flows are dynamically synchronized with the stage action. The tools above are currently being adapted within two experimental theatrical plays for validation purposes. This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
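    The script-to-cue mapping described above can be sketched as a lookup from script positions to pre-designed stage and accessibility elements, fired once when an automatic tracker reports that position. The cue table, names, and single-process design are illustrative assumptions; the project's actual system uses sensors and embedded communication rather than this toy dispatcher.

    ```python
    # Hypothetical cue table: script line -> stage/accessibility elements.
    CUES = {
        12: ["lights:dim", "subtitle:line-12"],
        40: ["projection:storm", "sign-language:segment-3"],
    }

    _fired = set()  # remember which cues have already been triggered

    def on_position(script_line):
        """Fire every cue mapped to the reported script line, exactly once."""
        triggered = []
        for cue in CUES.get(script_line, []):
            if (script_line, cue) not in _fired:
                _fired.add((script_line, cue))
                triggered.append(cue)
        return triggered
    ```

    Keying cues to script positions rather than to wall-clock time is what lets accessibility flows stay synchronized even when the live performance runs faster or slower than rehearsed.
    
    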

    Context-based and human-centred information fusion in diagnostics

    Maintenance management and engineering practice have progressed to adopt approaches which aim to reach maintenance decisions not by means of pre-specified plans and recommendations, but increasingly on the basis of the best contextually relevant available information and knowledge, all considered against stated objectives. Different methods for automating event detection, diagnostics and prognostics have been proposed, which may achieve very high performance when appropriately adapted and tuned to serve the needs of well-defined tasks. However, the scope of such solutions is often narrow and lacks a mechanism to incorporate human intervention and contributed knowledge. This paper presents a conceptual framework for integrating automated detection and diagnostics with human-contributed knowledge in a single architecture. This is instantiated by an e-maintenance platform comprising tools both for lower-level information fusion and for handling higher-level knowledge. Well-structured maintenance relationships, such as those present in a typical FMECA study, as well as on-the-job, human-contributed compact knowledge, are exploited to this end. A case study presenting the actual workflow of the process in an industrial setting is employed to pilot test the approach.

    Educational technology - Special Issue of ERCIM News
